Method and system for efficiently loading primitives into processors of a graphics system

ABSTRACT

A method and system for more efficiently loading a plurality of primitives for a scene into processors of a computer graphics system is disclosed. Each primitive has a top and a bottom. The primitives are ordered based on the top of each primitive. The system and method include providing at least one input, a merge circuit, a distributor, a feedback circuit and a controller. The input(s) is for receiving data relating to each primitive. The merge circuit is coupled with the input(s) and adds the data for a primitive having a top not lower than a current line. The distributor is coupled with the feedback circuit, eliminates an expired primitive and outputs the data for remaining primitives after the expired primitive has been removed. The expired primitive has a bottom above the current line. The feedback circuit is coupled to the merge circuit and the distributor and re-inputs to the merge circuit the data for the remaining primitives. The controller controls the feedback circuit, the distributor and the merge circuit.

FIELD OF THE INVENTION

The present invention relates to computer graphics systems, and more particularly to a method and system for more efficiently loading primitives into processors for a computer graphics system.

BACKGROUND OF THE INVENTION

A conventional computer graphics system can display graphical images of objects on a display. The display includes a plurality of display elements, known as pixels, typically arranged in a grid. In order to display objects, the conventional computer graphics system typically breaks each object into a plurality of polygons, termed primitives. A conventional system then renders the primitives in a particular order.

Some computer graphics systems are capable of rendering the primitives in raster order. Such a system is described in U.S. Pat. No. ______, entitled “______” and assigned to the assignee of the present application. In such a system, all of the primitives intersecting a particular pixel are rendered for that pixel. The primitives intersecting a next pixel in the line are then rendered. Typically, this process proceeds from left to right in the line until the line has been rendered, then recommences on the next line. The frame is rendered line by line, until the frame has been completed.

In order to render the frame, the primitives are loaded into processors. Typically, all of the primitives starting at a particular line are loaded into the processors at the start of the line. After the line has completed processing, primitives which have expired are ejected. An expired primitive is one which cannot be present on the next line. In other words, an expired primitive has a bottom that is no lower than the line that was just processed. Any new primitives for the next line are loaded at the start of the next line. The line is then processed as described above. This procedure continues until the frame is rendered.

Although the system is capable of rendering primitives in raster order, one of ordinary skill in the art will readily recognize that the processes of loading primitives and ejecting expired primitives each consume time and resources. In addition, in a complex scene, many primitives might expire at the end of a particular line and a large number of primitives might start at the next line. Ejecting the expired primitives and loading the new primitives might cause a significant delay in the pipeline. Furthermore, the primitives are all loaded into and processed by the processors. Thus, the number of primitives capable of being processed at a particular pixel is limited by the number of processors in the system. Typically, the number of processors is on the order of sixteen or thirty-two. As a result, the number of primitives that overlap at a particular pixel and that can be processed is limited to sixteen or thirty-two. The complexity of the frame is thereby limited. This limitation can be eased by increasing the number of processors. However, increasing the number of processors increases the space consumed by the graphics system, which is undesirable.

Accordingly, what is needed is a system and method for more efficiently loading primitives into the processors. The present invention addresses such a need.

SUMMARY OF THE INVENTION

The present invention provides a method and system for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system. Each of the plurality of primitives has a top and a bottom. The plurality of primitives is ordered based on the top of each of the plurality of primitives. The system and method comprise providing at least one input, a merge circuit, a distributor, a feedback circuit and a controller. The at least one input is for receiving data relating to each of the plurality of primitives. The merge circuit is coupled with the input and is for adding the data for a primitive having a top that is not lower than a current line. The distributor is coupled with the feedback circuit. The distributor eliminates an expired primitive and outputs the data for a remaining portion of the primitives after the expired primitive has been removed. The expired primitive has a bottom that is above the current line. The feedback circuit is coupled to the merge circuit and the distributor and re-inputs to the merge circuit the data for the remaining portion of the plurality of primitives. The controller controls the feedback circuit, the distributor and the merge circuit.

According to the system and method disclosed herein, the present invention provides a more efficient mechanism for loading primitives.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system including a graphics system.

FIG. 2 is a diagram of a portion of a display including a plurality of primitives rendered for a frame.

FIG. 3 is a flow chart of a method for loading and evicting primitives from processors.

FIG. 4 is a block diagram of one embodiment of a computer graphics system using one embodiment of a system in accordance with the present invention.

FIG. 5 is a block diagram of one embodiment of a system in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system.

FIG. 6 is a high-level flow chart of one embodiment of a method in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system.

FIGS. 7A and 7B depict a more detailed flow chart of one embodiment of a method in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to an improvement in computer graphics systems. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment will be readily apparent to those skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.

FIG. 1 is a block diagram of a computer system 10 including a computer graphics system 20. The computer system 10 also includes a central processing unit 12, a display 14, a user interface 16 such as a keyboard and/or mouse and a memory 18. The graphics system 20 is depicted as including an internal memory 22 and processors 24 that are coupled by a bus 23. The graphics system 20 typically has other components that are not shown for clarity.

FIG. 2 depicts a portion of the display 14. The display 14 includes a plurality of pixels. For clarity, only one pixel 15 is depicted. On the display are depicted primitives 30, 40 and 50. The primitives 30, 40 and 50 are typically part of a scene containing many primitives. The primitives in the scene may also overlap, as is shown in area 55 for primitives 30 and 40.

Referring to FIGS. 1 and 2, in order to render a scene on the display 14, the graphics system 20 must render the polygons. In a graphics system 20 described in U.S. Pat. No. ______, entitled “______” and assigned to the assignee of the present application, the graphics system 20 renders the primitives 30, 40 and 50 in raster order. In other words, the graphics system 20 renders a scene pixel by pixel in raster order. Thus, in the area 55 where primitives 30 and 40 overlap, two primitives are rendered for each pixel. In order to render the scene, data for the primitives 30, 40 and 50 must be loaded from the internal memory 22 to the processors 24.

FIG. 3 depicts a high level flow chart of a method 60 for loading primitives used in the above-mentioned U.S. patent. At the start of the line, new primitives for the line are loaded into the processors 24, via step 62. The primitives are loaded from the internal memory 22 to the processors 24. Thus, primitives which commenced at a previous line and which will contribute to the current line remain in the processors 24. The line is then processed, via step 64. Step 64 may include performing interpolation, texture processing, antialiasing or other operations used in rendering the scene. It is determined whether processing of the line is complete, via step 66. If not, then processing continues in step 64. If the line is completed, then the primitives that have expired are evicted from some or all of the processors 24, via step 68. A primitive that has expired cannot contribute to the next line and thus has a bottom that is no lower than the current line being processed. The new line is then commenced, via step 70. Any new primitives are then loaded, via step 62. The method 60 thus repeats until the frame has been rendered.
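The load, process and evict cycle of the method 60 can be illustrated with a short simulation. This is only a sketch under stated assumptions: y-values increase downward from the top of the frame, and the names `Primitive` and `render_conventional` are hypothetical and do not appear in the referenced patent. The rendering work of step 64 is omitted; the sketch only counts how many primitives are loaded, resident and evicted on each line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    index: int  # identifier of the primitive (hypothetical field name)
    y_top: int  # first line the primitive touches; y grows downward (assumption)
    y_bot: int  # last line the primitive touches


def render_conventional(primitives, num_lines):
    """Simulate method 60 and return (loaded, resident, evicted) per line."""
    pending = sorted(primitives, key=lambda p: p.y_top)
    resident = []
    stats = []
    next_new = 0
    for line in range(num_lines):
        # Step 62: load the primitives that start on this line.
        loaded = 0
        while next_new < len(pending) and pending[next_new].y_top <= line:
            resident.append(pending[next_new])
            next_new += 1
            loaded += 1
        # Steps 64 and 66: the line is processed (rendering work omitted).
        active = len(resident)
        # Step 68: evict primitives whose bottom is no lower than this line.
        survivors = [p for p in resident if p.y_bot > line]
        evicted = len(resident) - len(survivors)
        resident = survivors
        stats.append((loaded, active, evicted))
    return stats
```

In this sketch all loading and all eviction happen at line boundaries, which is the behavior whose cost is discussed in the following paragraph.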

Although the method and system shown in FIGS. 1 and 3 function, one of ordinary skill in the art will readily realize that there are limitations. Loading primitives in the processors 24 in step 62 requires time. Similarly, evicting primitives from the processors 24 in step 68 requires time. If a certain line differs significantly from a previous line, the number of primitives evicted and loaded may be quite large. This is particularly true if the bus 23 does not have sufficient throughput. As a result, the time required to perform steps 62 and 68 becomes significant, delaying completion of the frame by the graphics system 20. Furthermore, the number of primitives that can be processed for a particular pixel in a line is limited by the number of primitives that can be loaded into the processors 24. This number is the same as the number of processors 24, which is typically sixteen or thirty-two. Thus, the complexity of the scene that can be rendered is also limited. Although increasing the number of processors 24 addresses this problem, the space consumed by the graphics system 20 will also increase. Such an increase in space is undesirable.

The present invention provides a method and system for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system. Each of the plurality of primitives has a top and a bottom. The plurality of primitives is ordered based on the top of each of the plurality of primitives. The system and method comprise providing at least one input, a merge circuit, a distributor, a feedback circuit and a controller. The at least one input is for receiving data relating to each of the plurality of primitives. The merge circuit is coupled with the input and is for adding the data for a primitive having a top that is not lower than a current line. The distributor is coupled with the feedback circuit. The distributor eliminates an expired primitive and outputs the data for a remaining portion of the primitives after the expired primitive has been removed. The expired primitive has a bottom that is above the current line. The feedback circuit is coupled to the merge circuit and the distributor and re-inputs to the merge circuit the data for the remaining portion of the plurality of primitives. The controller controls the feedback circuit, the distributor and the merge circuit.

The present invention will be described in terms of a particular computer system, a particular computer graphics system and a particular set of processors. However, one of ordinary skill in the art will readily recognize that this method and system will operate effectively for other computer systems, other computer graphics systems, and other numbers of processors.

To more particularly illustrate the method and system in accordance with the present invention, refer now to FIG. 4, depicting one embodiment of a computer graphics system 100 using one embodiment of a system in accordance with the present invention. The computer graphics system 100 is preferably used in the computer system 10 in place of the computer graphics system 20. The computer graphics system 100 includes a system 130 in accordance with the present invention for more efficiently loading primitives into processors in the computer graphics system 100. The system 130 is termed herein a y-loop system 130. The computer graphics system 100 also includes an internal memory 110, a processor block 120 and additional processing circuitry 122. The additional processing circuitry 122 could include one or more interpolators, sorters, antialiasing units and other circuitry actually used in rendering the frame. Some embodiments of the additional processing circuitry 122 are described in the above-mentioned U.S. patent. The internal memory 110 is preferably a random access memory (“RAM”) 110. Data for the primitives are preferably loaded into the RAM 110. This data preferably includes an identifier for each primitive and the top and bottom coordinates for each primitive, and can include texture, color, or other data used in processing the primitive.

FIG. 5 is a block diagram of one embodiment of the system 130 in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system, such as the computer graphics system 100. The y-loop 130 includes a y-loop merge 140, a y-loop distributor 150, a controller 160 and y-loop feedback 170. The feedback 170 is preferably a first-in-first-out buffer (FIFO) 170. In a preferred embodiment, the feedback FIFO 170 has a depth that is the same as the number of virtual processors in the graphics system 100, preferably one thousand and twenty-four. In a preferred embodiment, the number of actual, physical processors in the processor block 120 is different from the number of virtual processors. The number of virtual processors is set by the number of processors the graphics system 100 appears to have because of the configuration and functions of the processors actually used. Because the graphics system 100 has one thousand and twenty-four virtual processors, this is the upper limit of primitives that can be processed for a particular pixel. The number of actual processors in the processor block 120 is substantially smaller. In a preferred embodiment, there are sixteen processors in the processor block 120. However, nothing prevents the use of another number of processors and/or another number of virtual processors.

The y-loop merge 140 is used to merge data for new primitives with data for primitives that have been fed back through the feedback FIFO 170, as discussed below. The y-loop merge includes a compare block 142 and a merge block 144. The primitives input to the y-loop merge 140 are ordered based on y-values, or height in the frame. Thus, the primitives input to the y-loop merge 140 are preferably ordered based on the position of their top, shown as y-top 132 in FIG. 5. The data input to the y-loop merge 140 includes the y-top 132 (top y-value of the primitive), the y-bot 134 (bottom y-value of the primitive), the index 136 which identifies the primitive, the primitive type 137 and the top-bot-is-left 138. The primitive type 137 is either the number of the primitive, an empty primitive if the frame is empty or the end of the frame if the primitive is the last in the frame. The top-bot-is-left 138 indicates the orientation of the primitive being rendered. Thus, top-bot-is-left 138 indicates whether the side of the primitive which connects the top vertex and bottom vertex is on the left or right boundary. Other data for the primitive, such as the color or texture values, remain stored in the internal memory 110. The y-loop merge 140 merges data input for new primitives that start at a current line with data for primitives that have been looped through the y-loop merge 140 using the merge block 144. Thus, the y-loop merge 140 also has as an input the current line 168. In order to determine whether to accept a new primitive, the y-loop merge 140 uses the compare block 142 to compare the y-top 132 with the current line 168. If the y-top 132 is not greater than the current line 168, that is, if the top of the primitive is not lower than the current line, then the y-loop merge 140 accepts the primitive and provides the data to the merge block 144.
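The per-primitive data input to the y-loop merge 140 can be pictured as a small record. The following is a minimal sketch, not a description of the actual hardware signals: the names `YLoopEntry`, `PrimitiveType` and `is_ordered_by_top` are hypothetical, the three-way enumeration is one assumed reading of the primitive type 137, and the field widths of the real signals are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class PrimitiveType(Enum):
    # Assumed encoding of the primitive type 137; the names are hypothetical.
    NORMAL = "normal"              # an ordinary primitive
    EMPTY = "empty"                # marks an empty frame
    END_OF_FRAME = "end_of_frame"  # marks the last primitive in the frame


@dataclass(frozen=True)
class YLoopEntry:
    y_top: int              # y-top 132: top y-value of the primitive
    y_bot: int              # y-bot 134: bottom y-value of the primitive
    index: int              # index 136: identifies the primitive
    prim_type: PrimitiveType  # primitive type 137
    top_bot_is_left: bool   # top-bot-is-left 138: top-to-bottom edge on left


def is_ordered_by_top(entries):
    """Check that the input stream is ordered by y-top, as the merge expects."""
    return all(a.y_top <= b.y_top for a, b in zip(entries, entries[1:]))
```

Color and texture values are deliberately absent from the record, since the description states that such data remain in the internal memory 110 and are located through the index.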

The distributor 150 is coupled to and receives data from the y-loop merge 140. The data received includes the y-bot 134, the index 136, the primitive type 137 and the top-bot-is-left 138. The distributor 150 includes a compare block 152 and a distribute block 154. The distributor 150 evicts primitives that have expired and distributes data for primitives that have not expired. To do so, the distributor 150 uses the compare block 152 to compare the bottom of each primitive with the current line and provides an evict signal 156 to the distribute block 154. The evict signal 156 indicates whether to evict a particular primitive. If the bottom of the primitive, the y-bot 134, is less than the current line, then the primitive will be evicted.
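The eviction test performed by the compare block 152 reduces to a single comparison per primitive. The sketch below assumes y-values that increase downward; the names `Entry`, `evict_signal` and `distribute` are hypothetical stand-ins for the data and signals described above, not part of the disclosed circuit.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    index: int  # index 136 (hypothetical field name)
    y_bot: int  # y-bot 134: bottom y-value of the primitive


def evict_signal(entry, current_line):
    """Model of evict signal 156: true when the bottom is above the line."""
    return entry.y_bot < current_line


def distribute(entries, current_line):
    """Split entries into those kept and those evicted, preserving order."""
    kept, evicted = [], []
    for entry in entries:
        if evict_signal(entry, current_line):
            evicted.append(entry)
        else:
            kept.append(entry)
    return kept, evicted
```

A primitive whose bottom lies exactly on the current line still contributes to that line, so it is kept by this test and is only evicted on a later pass.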

The distribute block 154 provides data for primitives that are not evicted to two components. First, the distribute block 154 outputs the index 136 to the processors 120 (not shown in FIG. 5), via output 180. The distribute block 154 also preferably outputs the primitive type 137 and the top-bot-is-left 138 to the processors 120. The current line 168 is also provided to the processors 120 from the controller 160. Thus, the processors 120 receive data for the primitives.

The distribute block 154 also feeds back data to the feedback FIFO 170, as well as providing the data to the controller 160 through lines 182. The feedback FIFO 170 is thus coupled both to the output of the distributor 150 and to the input of the y-loop merge 140. Because it preserves the order of the data that was provided to it, the feedback FIFO 170 will retain the ordering of the primitives, from top to bottom. In addition, the feedback FIFO 170 will feed data for primitives which have not expired back to the input of the y-loop 130.

FIG. 6 is a high-level flow chart of one embodiment of a method 200 in accordance with the present invention for more efficiently loading primitives into processors 120 in a computer graphics system 100. The method 200 preferably uses the graphics system 100. Consequently, the method 200 is described in the context of the computer system 10 and the graphics system 100.

It is determined whether any new primitives start on the current line using the y-loop merge 140, via step 202. Preferably step 202 is performed by determining whether the top of the primitive, as determined by the y-top 132, is less than or equal to the current line 168. If any new primitives commence on the current line, then they are merged with data for certain previous primitives using the merge block 144, via step 204. Using the distributor 150, it is determined whether any of the primitives have expired, via step 206. Step 206 is preferably performed by determining whether the bottom of the primitive, as determined by the y-bot 134, is less than the current line 168. If so, then the expired primitives are ejected, via step 208. If no primitives have expired or once the expired primitives have been ejected, the remaining primitives are output to the processors 120 and fed back to the feedback FIFO 170, via step 210. The method 200 may then repeat until the frame has been rendered. Thus, primitives which will contribute to the frame for a particular number of lines will be looped through the feedback FIFO 170 that number of times. The primitive need not be reloaded into the y-loop 130.
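The steps 202 through 210 can be sketched as a software loop over the lines of a frame. This is an illustrative model only, assuming y-values that increase downward and one pass of the loop per line; the names `Prim` and `y_loop` are hypothetical, the feedback FIFO 170 is modeled with a `collections.deque`, and the FIFO depth limit and controller signalling are not modeled.

```python
from collections import deque
from typing import NamedTuple


class Prim(NamedTuple):
    index: int  # identifier of the primitive (hypothetical field name)
    y_top: int  # top y-value; y grows downward (assumption)
    y_bot: int  # bottom y-value


def y_loop(primitives, num_lines):
    """Return, for each line, the indices output to the processors."""
    incoming = deque(sorted(primitives, key=lambda p: p.y_top))
    feedback = deque()  # stands in for the feedback FIFO 170
    per_line = []
    for current_line in range(num_lines):
        # Fed-back primitives come first, so top-to-bottom order is kept.
        merged = list(feedback)
        feedback.clear()
        # Steps 202 and 204: merge new primitives with y_top <= current line.
        while incoming and incoming[0].y_top <= current_line:
            merged.append(incoming.popleft())
        # Steps 206 and 208: eject primitives with y_bot < current line.
        remaining = [p for p in merged if not p.y_bot < current_line]
        # Step 210: output the survivors and feed them back.
        per_line.append([p.index for p in remaining])
        feedback.extend(remaining)
    return per_line
```

For example, with primitives spanning lines 0 to 1, line 0 only, and lines 2 to 3, the second primitive is output on line 0, fed back once, and ejected when it is examined on line 1; it is never explicitly unloaded at a line boundary.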

Using the y-loop 130 and the method 200, primitives can be continuously loaded and ejected. As a result, any delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processors 120 in the graphics system 100 can be made more efficient. Furthermore, because the feedback FIFO 170 can hold data for a large number of primitives, the y-loop 130 can be used with a large number of virtual (or actual) processors. This feature allows more primitives to overlap a single pixel. Consequently, limitations in the complexity of the scene are reduced.

FIGS. 7A and 7B depict a more detailed flow chart of one embodiment of a method 220 in accordance with the present invention for more efficiently loading primitives into processors in a computer graphics system. The method 220 preferably uses the system 100. Consequently, the method 220 is described in the context of the graphics system 100 and the y-loop 130 of FIGS. 4 and 5.

The method 220 commences by setting the current line value 168 to the top of the frame, and setting the read and write addresses for the feedback FIFO 170 to the start of the feedback FIFO 170, via step 222. Step 222 is performed once per frame. It is determined whether a new primitive in a FIFO (not shown) connected to the y-loop merge 140 is ready, via step 224. The FIFO holds the primitives to be rendered in order from lowest to highest y-top 132. If a new primitive is ready, then for the new primitive, it is determined whether the y-bot 134 is less than the current line 168, via step 230. If so, the primitive actually ends above the current line and is therefore rejected, via step 228. The method 220 would then return to step 224. If the y-bot 134 is not less than the current line 168, then using the compare block 142 it is determined whether y-top 132 is less than or equal to the current line 168, via step 232. If so, then the new primitive starts no lower than the current line, so the primitive is read into the y-loop merge 140 from the FIFO which is connected to the y-loop merge 140, via step 234. The new primitive would then be output to the processors using the distribute block 154 of the distributor 150, via step 242.

If it is determined in step 232 that the y-top 132 is not less than or equal to the current line 168, then it is determined whether the feedback FIFO 170 is empty, via step 236. If not, then the primitive is read from the feedback FIFO 170 into the y-loop merge 140, via step 238. The primitive would then be output by the distributor, via step 242.

If it is determined in step 236 that the feedback FIFO 170 is empty, then it is determined whether any primitives were processed for the current line 168, via step 240. If not, then the empty line is output, via step 244. If there were primitives during the current line 168 or once the empty line is output, it is determined whether the current line 168 is the bottom line, via step 246. If so, then the input stream is flushed and the current line is set to the top line, via step 248. The method 220 could then start again for the new frame. If the current line 168 is not the bottom of the frame, then the current line is incremented and the address in the feedback FIFO 170 from which data is read is incremented, via step 250. The method 220 then returns to step 224.

If it is determined in step 224 that a new primitive is not ready to be loaded, then it is determined whether the feedback FIFO 170 has more than one entry, via step 230. If not, then the method returns to step 224. If so, then the primitive(s) from the feedback FIFO 170 are read, via step 238. The primitive would then be output in step 242.

Once a primitive, whether read from the feedback FIFO 170 or newly accepted, is output in step 242, the compare block 152 is used to determine whether the line after the current line is below the bottom line of the primitive, via step 252. If so, then the primitive is evicted, via step 254. Otherwise, the primitive is provided to the feedback FIFO 170 using the distribute block 154.

Thus, the primitives are provided to the processors 120 through the y-loop 130. Using the y-loop 130 and the method 220, primitives can be continuously loaded and ejected. Primitives that contribute to multiple lines of a scene are looped through the y-loop 130 using the feedback FIFO 170, while primitives which have expired are evicted using the distributor 150. As a result, delays at the end of a line due to ejecting and loading of primitives can be reduced or eliminated. Thus, loading of primitives to the processors 120 in the graphics system 100 can be made more efficient. Furthermore, because the FIFO 170 can hold data for a large number of primitives, the y-loop 130 can be used with a large number of virtual (or actual) processors. This feature allows more primitives to overlap a single pixel. Consequently, limitations in the complexity of the scene are reduced.

A method and system have been disclosed for more efficiently loading primitives into processors for a graphics system. Software written according to the present invention is to be stored in some form of computer-readable medium, such as memory or CD-ROM, or transmitted over a network, and executed by a processor. Consequently, a computer-readable medium is intended to include a computer readable signal which, for example, may be transmitted over a network. Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

1. A system for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system, each of the plurality of primitives having a top and a bottom, the plurality of primitives being ordered based on the top of each of the plurality of primitives, the system comprising: a merge circuit for receiving data relating to each of the plurality of primitives and adding the data for a primitive having a top that is not lower than a current line; a distributor, coupled with the feedback circuit, for eliminating an expired primitive, the expired primitive having a bottom that is above the current line, and for outputting at least a portion of the data for a remaining portion of the primitives after the expired primitive has been removed, the at least a portion of the data output by the distributor controlling loading of the plurality of primitives by the plurality of processors; a feedback circuit, coupled to the merge circuit and the distributor, for re-inputting to the merge circuit the data for the remaining portion of the plurality of primitives; and a controller for controlling the feedback circuit, the distributor and the merge circuit.

2. The system of claim 1 wherein the feedback circuit further includes a first in first out (“FIFO”) buffer.

3. The system of claim 1 wherein each of the plurality of primitives includes a y-top that marks the top of each of the plurality of primitives and wherein the merge circuit compares the y-top for a primitive of the plurality of primitives to a current y-value for the current line and merges the primitive if the y-top is not greater than the current line.

4. The system of claim 1 wherein each of the plurality of primitives includes a y-bottom that marks at a particular line the bottom of each of the plurality of primitives and wherein the distributor circuit compares the y-value for a primitive of the plurality of primitives to a next line y-value for the current line and discards the primitive if the y-bottom is not greater than the next line y-value.

5. The system of claim 1 wherein the at least a portion of the data for each of the plurality of primitives is an identifier for each of the plurality of primitives.

6. A method for more efficiently loading a plurality of primitives for a scene into a plurality of processors of a computer graphics system, each of the plurality of primitives having a top and a bottom, the plurality of primitives being ordered based on the top of each of the plurality of primitives, the method comprising the steps of: (a) determining whether the top of at least one new primitive of the plurality of primitives is not lower than a current line; (b) merging data for the at least one new primitive if the top is not lower than the current line; (c) eliminating an expired primitive and outputting at least a portion of data for a remaining portion of the primitives after the expired primitive has been removed, the expired primitive having a bottom that is above the current line, the data output by the distributor controlling loading of the plurality of primitives by the plurality of processors; and (d) re-inputting to the merge circuit data for the remaining portion of the plurality of primitives.

7. The method of claim 6 wherein each of the plurality of primitives includes a y-top that marks a top of each of the plurality of primitives and wherein the determining step (a) further includes the step of: (a1) comparing the y-top for a primitive of the plurality of primitives to a current y-value for the current line and wherein the merging step (b) further includes the step of (b1) merging the primitive if the y-top is not greater than the current line.

8. The method of claim 6 wherein each of the plurality of primitives includes a y-bottom that marks at a particular line the bottom of each of the plurality of primitives and wherein the eliminating step (c) further includes the steps of: (c1) comparing the y-value for a primitive of the plurality of primitives to a next line y-value for the current line and (c2) discarding the primitive if the y-bottom is not greater than the next line y-value.

9. The method of claim 6 wherein the at least a portion of the data for each of the plurality of primitives is an identifier for each of the plurality of primitives.

10. The method of claim 6 wherein the computer graphics system further includes an internal memory, and wherein the method further includes the steps of: (e) continuously loading the plurality of primitives into the internal memory; and (f) providing a primitive of the plurality of primitives to a processor of the plurality of processors only if the distributor outputs the data for the primitive.